ABSTRACT

The effect of dimensionality on an adaptive test's
ability estimation was examined. Two-dimensional data sets, which
differed from one another in the interdimensional ability
association, the correlation among the difficulty parameters, and
whether the item discriminations were or were not confounded with
item difficulty, were generated for 1,600 simulated examinees. The
generated data were used for Bayesian computerized adaptive testing
(CAT) simulations (three-parameter logistic model), and the CAT
ability estimates were compared with the simulated examinees' known
abilities. The dimensionality of response data shifted the focus for
the minimization of measurement errors from known abilities (with
unidimensional data) to the average of the latent abilities (with
bidimensional data). Three tables and 24 graphs summarize the study
results. (Author/SLD)

ABSTRACT

This study examined the effect of dimensionality on an adaptive test's ability estimation.
Two-dimensional data sets were generated which differed from one another in the
interdimensional ability association, the correlation among the difficulty parameters, and
whether the item discriminations were or were not confounded with item difficulty. The
generated data were used for Bayesian CAT simulations (three-parameter logistic model)
and the CAT ability estimates were compared with the simulees' known abilities ($\theta_T$s).
Results show that the dimensionality of the response data shifts the focus for the
minimization of measurement errors from $\theta_T$ (with unidimensional data) to the average of the
latent abilities (with bidimensional data).

Computerized adaptive testing (CAT) is concerned with the minimization of measurement
errors in the estimation of an examinee's ability. To achieve this goal the examinee
is administered items based on his or her current ability estimate. These items are
selected such that the examinee is expected to have about a fifty percent chance of
correctly answering the items. Some of CAT's benefits include equiprecise measurement
throughout the ability continuum and adaptive tests which are shorter than the
corresponding paper-and-pencil tests.

CATs typically are based on one of the dichotomous unidimensional IRT models,
such as the three-parameter logistic (3PL) or Rasch models (e.g., McBride & Martin, 1983;
Kingsbury & Houser, 1988). The development of the CAT item pool requires the
identification of the data's dimensionality before fitting the IRT model. That is, although some
items may be considered unidimensional, other test items may require more than one
ability to obtain a correct response. For instance, correctly answering a mathematical
word problem may be considered to be a function of reading and mathematical abilities.
Implications of the violation of unidimensionality for CAT item pool development (e.g.,
equating, scale shrinkage) may be found in Doody-Bogan and Yen (1983) as well as in Yen
(1985).

Multidimensional models have been developed in order to address the issue of
multiple latent dimensions (e.g., McKinley & Reckase, 1983; Sympson, 1978). These
models are classified as either compensatory or noncompensatory. Conceptually, a
compensatory model is one in which an examinee's latent traits interact to produce a response
to an item. This interaction may take the form of an examinee's facility in one latent trait
($\theta$) compensating for a deficiency in another $\theta$. In contrast, in a noncompensatory model
the examinee's $\theta$s do not interact to yield a response. Although these models have been
used in some research, they have yet to obtain widespread acceptance or use in
applications.
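
To make the distinction concrete, the following minimal sketch contrasts one commonly used compensatory form with one commonly used noncompensatory form for a two-dimensional item with guessing. The functional forms, the function names, and the parameter values are illustrative assumptions; they are not taken from the studies cited above.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def p_compensatory(thetas, a, b, c):
    """Assumed compensatory form: dimensions are summed inside one logistic."""
    z = sum(ak * (tk - bk) for ak, tk, bk in zip(a, thetas, b))
    return c + (1.0 - c) * logistic(z)

def p_noncompensatory(thetas, a, b, c):
    """Assumed noncompensatory form: one logistic per dimension, multiplied."""
    prod = 1.0
    for ak, tk, bk in zip(a, thetas, b):
        prod *= logistic(ak * (tk - bk))
    return c + (1.0 - c) * prod

# Hypothetical item and two hypothetical examinees with the same average ability.
a, b, c = (1.0, 1.0), (0.0, 0.0), 0.2
balanced = (0.0, 0.0)    # average ability on both dimensions
lopsided = (2.0, -2.0)   # strong on dimension 1, weak on dimension 2

print(p_compensatory(balanced, a, b, c), p_compensatory(lopsided, a, b, c))
print(p_noncompensatory(balanced, a, b, c), p_noncompensatory(lopsided, a, b, c))
```

Under the compensatory form the two examinees have the same probability of success, because the surplus on one dimension offsets the deficit on the other; under the noncompensatory form the lopsided examinee is penalized.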

Given that the dimensionality assumption of unidimensional IRT subsumes the
principle of local independence (Lord, 1980), violation of this assumption should affect
the likelihood function used for parameter estimation. A number of studies (e.g.,
Ackerman, 1989; Way, Ansley, & Forsyth, 1988; Ansley & Forsyth, 1985; Reckase, 1979)
have examined the effect of multidimensional response data on unidimensional IRT
parameter estimates. These studies have been primarily concerned with the effects of
dimensionality on the calibration of a multidimensional data set by either LOGIST
(Wingersky, Barton, & Lord, 1982) or BILOG (Mislevy & Bock, 1982). Although the
models used for data generation differed, these studies have found that
dimensionality affects parameter estimation. In general, when a compensatory
multidimensional IRT model was used for data generation, $\hat{b}$ was found to be an estimate of the
average of the true $b$s (Way et al., 1988), $\hat{a}$ was an estimate of the sum of $a_1$ and $a_2$ (Way et
al., 1988), and the ability estimate $\hat{\theta}$ was an estimate of the average of the true $\theta$s (Ackerman,
1989; Way et al., 1988). In contrast, data generation using a noncompensatory model
showed that $\hat{b}$ was an overestimate of, or correlated more highly with, $b_1$ than $b_2$
(Ackerman, 1989; Way et al., 1988; Ansley & Forsyth, 1985), $\hat{a}$ was an estimate of the
average of the true $a$s (Way et al., 1988; Ansley & Forsyth, 1985), and $\hat{\theta}$ was an estimate
of the average of the true $\theta$s (Ackerman, 1989; Way et al., 1988; Ansley & Forsyth, 1985). In
general, these conclusions come from correlational analyses of the parameters with their
estimates and from an assessment of the accuracy of parameter estimation by the calculation of
the mean absolute difference (a.k.a. MAD or AAD) across examinees or items, whichever
was pertinent.

In general, studies which have investigated the operating characteristics of CAT
have involved the simulation of unidimensional data and item pools (e.g., Weiss, 1982;
McBride, 1977; Jensema, 1974). However, given "...that no actual psychological
measurement instrument is likely to be exactly unidimensional..." the issue becomes one of
whether the "...instrument is sufficiently unidimensional to allow application of IRT"
(Hulin, Drasgow, & Parsons, 1983, p. 40). In live testing, where the possibility of less
than ideal unidimensional data may exist, the primary concern has been with the
estimation of the reliability and validity of CAT (e.g., McBride & Martin, 1983; Weiss &
Kingsbury, 1984). Further, because in these studies the examinee's true ability is
unknown, the influence of dimensionality on the accuracy of ability parameter estimation
cannot be investigated.

This study investigated the effect of varying degrees of dimensionality on CAT
ability estimation. That is, an adaptive test based on unidimensional item parameters was
administered to a simulee who used more than one ability to respond. Two-dimensional
data sets were generated which differed from one another in the interdimensional ability
association, the correlation among the difficulty parameters, and whether the item
discriminations were or were not confounded with item difficulty. This latter factor is
included because of Reckase, Carlson, Ackerman, and Spray's (1986) finding that upper
deciles of a tridimensional ability differ mainly on $\theta_2$ while at lower deciles the ability
differed primarily on $\theta_1$ (cited in Ackerman, 1989). Simulees with known abilities were
administered unidimensional tests and their abilities estimated on the basis of their
multidimensional responses. In contrast to the studies mentioned above (i.e., Ackerman,
1989; Way et al., 1988; Ansley & Forsyth, 1985), the accuracy and bias of the $\hat{\theta}$s at
various points along the ability continuum were assessed.

METHOD 

Data: The data were generated according to a multidimensional 3PL (M3PL) model
(Doody-Bogan & Yen, 1983). This model requires a set of multidimensional $\theta$s as well as a set of
(multidimensional) item parameters. The multidimensional $\theta$s were generated such that
the examinee's ability on dimension 1 ($\theta_1$) was evenly distributed between -3.0 and 3.0
using a 0.4 logit interval between successive $\theta$ levels (i.e., for 100 examinees $\theta_1$ = -3.0, for
100 examinees $\theta_1$ = -2.6, etc.). The examinee's ability on the second dimension ($\theta_2$) was
derived from $\theta_1$ by using Hoffman's (1959) technique for generating correlated data. For
each of the 1600 simulees $\theta_2$ was obtained by randomly sampling a normal deviate ($Z$) from
a unit normal curve and calculating:

$\theta_2 = \theta_1 + (k/r)Z$    (1)

where $k = \sqrt{1 - r^2}$, and $r$ is the desired intercorrelation between $\theta_1$ and $\theta_2$. Four
interdimensional $\theta$ correlations ($r_{\theta_1\theta_2}$) were investigated, from extreme bidimensionality
to almost unidimensionality; values for $r_{\theta_1\theta_2}$ were 0.03, 0.30, 0.60, and 0.90.
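
The generation of the two ability dimensions can be sketched as follows. This is a minimal illustration of Equation 1, not the authors' program; the function names, the random seed, and the use of Python's standard `random` module are assumptions made for the example.

```python
import math
import random

def theta1_grid(n_per_level=100, low=-3.0, step=0.4, n_levels=16):
    """theta_1 fixed at 16 evenly spaced levels, n_per_level simulees at each."""
    return [round(low + i * step, 1)
            for i in range(n_levels) for _ in range(n_per_level)]

def correlated_theta2(theta1, r, rng):
    """Equation 1: theta_2 = theta_1 + (k / r) * Z, with k = sqrt(1 - r^2)."""
    k = math.sqrt(1.0 - r * r)
    return [t + (k / r) * rng.gauss(0.0, 1.0) for t in theta1]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)          # arbitrary seed, for reproducibility only
theta1 = theta1_grid()          # 16 levels x 100 simulees = 1600 values
theta2 = correlated_theta2(theta1, 0.90, rng)
print(len(theta1), pearson(theta1, theta2))
```

Because Equation 1 divides by $r$, calling `correlated_theta2` with `r = 0.0` raises a division-by-zero error, which illustrates why a zero correlation cannot be requested with this technique.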

In the following, an item parameter's subscript refers to a dimension. The
difficulty parameters ($b_1$ and $b_2$) were generated in a fashion analogous to the generation
of $\theta_1$ and $\theta_2$. That is, the $b_1$ for sets of four items was fixed at every 0.1 logit between
-3.5 and 3.5 (e.g., for 4 items $b_1$ = -3.5, for 4 items $b_1$ = -3.4, etc.). The $b_2$ for each of the
284 items was derived from the item's $b_1$ using the correlated generation method
mentioned above. Three $b_1 b_2$ correlations ($r_{b_1b_2}$) were used in the study: 0.03, 0.60,
and 0.90.

The discrimination parameters ($a_1$ and $a_2$) were created by randomly sampling
from a uniform distribution with a minimum value of 0.20 and a maximum value of 1.8.
This set of $a$s was combined with the three sets of $b$s to form three item pools, where all
item pools had the same set of $a$s; this combination of the randomly ordered $a$s with the $b$s
forms the nonconfounding condition. The confounding between $a$s and $b$s was obtained
by sorting $a_1$ into ascending order and sorting $a_2$ into descending order (cf. Ackerman,
1989). The pseudo-guessing parameter, $c$, was set to 0.20.

The interdimensional correlations of 0.30, 0.60, and 0.90 were obtained from the
literature (Ackerman, 1989; Way et al., 1988; Ansley & Forsyth, 1985); $r_{\theta_1\theta_2}$ = 0.03
was used as an approximation to $r_{\theta_1\theta_2}$ = 0.0 because this latter value could not be used
with Hoffman's technique. The $r_{b_1b_2}$ = 0.03 was obtained from Yen (1985), whereas
$r_{b_1b_2}$ = 0.60 ($r_{b_1b_2}^2$ = 0.36) and $r_{b_1b_2}$ = 0.90 ($r_{b_1b_2}^2$ = 0.81) were used to simulate
moderate and high linear relationships. The minimum and maximum $a$s are the same as
those in Ackerman (1989). The constant used for $c$ came from Way et al. (1988).

To summarize, the data generation was based on 6 different combinations of item
parameters (3 levels of $r_{b_1b_2}$ by 2 levels of confounding) and four levels of
interdimensional ability association. The crossing of these three factors produced 24 response data
sets. For each data set the true $\theta_T$s plus the relevant 284 true item parameters were used
to generate binary response strings with a random error component for each simulated
examinee. Generation of the binary response strings was accomplished by calculating, for
a given $\theta_T$ pair and a given item, the probability of a correct response according to the
M3PL model. To create the random error component for a response, a random number was
selected from a uniform distribution [0,1] and compared to the calculated probability. If
the random number was less than or equal to the calculated probability, then a response of
1 was produced (a correct answer); otherwise a 0 was generated (an incorrect response).
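
The response-generation step described above can be sketched as follows. The exact M3PL expression used in the study is not reproduced in this section, so the compensatory form below, the scaling constant `D = 1.7`, and all function names are assumptions made for illustration only.

```python
import math
import random

def m3pl_prob(theta, a, b, c, D=1.7):
    """Assumed compensatory M3PL: c + (1 - c) * logistic(D * sum a_k (theta_k - b_k))."""
    z = D * sum(ak * (tk - bk) for ak, tk, bk in zip(a, theta, b))
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def simulate_response(theta, item, rng):
    """Return 1 if a uniform random draw is <= the model probability, else 0."""
    a, b, c = item
    return 1 if rng.random() <= m3pl_prob(theta, a, b, c) else 0

def simulate_string(theta, items, rng):
    """Binary response string of one simulee to every item in the pool."""
    return [simulate_response(theta, item, rng) for item in items]

# Size of the design described in the text.
n_data_sets = 3 * 2 * 4          # r_b1b2 levels x confounding levels x r_theta levels
n_tests = n_data_sets * 1600     # one adaptive test per simulee per data set

# Hypothetical two-item pool: ((a1, a2), (b1, b2), c).
items = [((1.0, 0.5), (-1.0, 0.0), 0.20), ((0.8, 1.2), (1.0, 0.5), 0.20)]
print(simulate_string((0.4, -0.2), items, random.Random(7)), n_data_sets, n_tests)
```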

Program: A computer program was written that simulated a CAT based on the 3PL model
and which used Bayesian ability estimation with Owen's Bayes updating (i.e., Jensema's
(1974) alpha technique) for item selection. The adaptive testing simulation was
terminated when either of two criteria was met: a maximum of thirty items was reached or
a standard error of estimate (SEE) of 0.05 or less was obtained.
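
The overall control flow of such a simulation (select an item, score it, update the ability estimate, check the two stopping rules) can be sketched as below. This is not the program used in the study: the sketch swaps in a simple grid-based posterior update in place of Owen's Bayesian updating and a closest-difficulty rule in place of Jensema's selection technique, and every name and default value in it is a hypothetical choice for the example.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Unidimensional 3PL probability of a correct response (D is an assumed constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def run_cat(respond, pool, max_items=30, see_stop=0.05):
    """respond(item_index) -> 0 or 1; pool is a list of (a, b, c) tuples."""
    grid = [-4.0 + 0.05 * i for i in range(161)]       # theta grid on [-4, 4]
    post = [math.exp(-0.5 * t * t) for t in grid]      # unnormalized N(0, 1) prior
    used = set()
    est, see = 0.0, 1.0
    limit = min(max_items, len(pool))
    while len(used) < limit and see > see_stop:
        # Simplified selection: unused item whose difficulty is closest to the estimate.
        j = min((i for i in range(len(pool)) if i not in used),
                key=lambda i: abs(pool[i][1] - est))
        used.add(j)
        u = respond(j)
        a, b, c = pool[j]
        # Multiply the posterior by the likelihood of the observed response.
        post = [w * (p3pl(t, a, b, c) if u else 1.0 - p3pl(t, a, b, c))
                for w, t in zip(post, grid)]
        total = sum(post)
        post = [w / total for w in post]
        est = sum(w * t for w, t in zip(post, grid))                       # posterior mean
        see = math.sqrt(sum(w * (t - est) ** 2 for w, t in zip(post, grid)))  # posterior SD
    return est, see, len(used)

# Hypothetical pool and a deterministic responder, for demonstration only.
pool = [(1.4, step / 10.0, 0.1) for step in range(-35, 36) for _ in range(4)]
est, see, n_items = run_cat(lambda j: 1 if pool[j][1] <= 1.0 else 0, pool)
print(est, see, n_items)
```

The two stopping rules appear together in the `while` condition, so the test ends as soon as either the item limit or the SEE criterion is reached.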

A unidimensional item pool was created for use with the Bayesian CAT.
Discrimination, difficulty, and pseudo-guessing parameters were generated for 284 items.
The discrimination ($a$) and pseudo-guessing ($c$) parameters were generated by random
sampling from a uniform distribution with the following restrictions: (a) $a$ was
restricted to the inclusive range of 0.80 to 2.00; and (b) $c$ was allowed to vary between 0.00
and 0.20. The difficulty parameters ($b$) were uniformly distributed between -3.5 and 3.5
(inclusive) with four items at each 0.10 interval (i.e., there were four items with $b$ =
-3.5, four items with $b$ = -3.4, etc.). The use of multiple items at each 0.1 interval was
intended to ensure that items of appropriate difficulty would always be available for the
Bayesian CAT's ability estimation. These item parameter values are consistent with
desirable item pool characteristics (Patience & Reckase, 1980; Urry, 1977). A Bayesian CAT
was then administered to each of the 1600 examinees in each of the 24 multidimensional
response data sets.
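
A pool with these characteristics can be generated in a few lines. The sketch below is only an illustration of the description above; the function name, the tuple layout, and the seed are assumptions.

```python
import random

def build_unidimensional_pool(rng, items_per_level=4):
    """Return a list of (a, b, c) tuples: 71 difficulty levels x 4 items = 284 items."""
    pool = []
    for step in range(-35, 36):               # b = -3.5, -3.4, ..., 3.5
        b = step / 10.0
        for _ in range(items_per_level):
            a = rng.uniform(0.80, 2.00)       # discrimination
            c = rng.uniform(0.00, 0.20)       # pseudo-guessing
            pool.append((a, b, c))
    return pool

pool = build_unidimensional_pool(random.Random(3))   # arbitrary seed
print(len(pool), pool[0][1], pool[-1][1])
```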

Analyses: Analysis of the CAT simulations involved using root mean square error (RMSE),
bias, and correlations (Pearson product-moment, Spearman rank-order) between $\hat{\theta}$ and
$\theta_1$, $\theta_2$, and between $\hat{\theta}$ and the average of $\theta_1$ and $\theta_2$ ($\bar{\theta}$). Descriptive statistics were
calculated on the number of items administered, the $\hat{\theta}$s, as well as on various item pool
characteristics.
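
For reference, the two accuracy indices can be computed as in the following sketch. The formulas are the conventional definitions (bias taken as the mean of estimate minus target, so positive values indicate overestimation); the section does not spell them out, so this sign convention, the function names, and the toy numbers are assumptions.

```python
import math

def rmse(estimates, targets):
    """Root mean square error of the estimates about the targets."""
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / n)

def bias(estimates, targets):
    """Mean signed error (estimate minus target)."""
    return sum(e - t for e, t in zip(estimates, targets)) / len(estimates)

def theta_bar(theta1, theta2):
    """Average of the two latent abilities for each simulee."""
    return [(t1 + t2) / 2.0 for t1, t2 in zip(theta1, theta2)]

# Toy data in which the estimate tracks the average ability exactly.
theta1 = [-2.0, 0.0, 2.0]
theta2 = [0.0, 0.0, 0.0]
theta_hat = [-1.0, 0.0, 1.0]
avg = theta_bar(theta1, theta2)
print(rmse(theta_hat, avg), rmse(theta_hat, theta1), bias(theta_hat, theta1))
```

In the study these indices were evaluated separately at points along the ability continuum, which amounts to applying the same functions to the simulees at each ability level.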

RESULTS 

For the 0.03, 0.30, 0.60, and 0.90 interdimensional ability conditions the observed
correlations were -0.028, 0.303, 0.590, and 0.964, respectively. Table 1 shows the item
parameters' interdimensional correlations for the confounded and nonconfounded
conditions. As can be seen, for the desired $r_{b_1b_2}$ of 0.03, 0.60, and 0.90 the observed
correlations were 0.095, 0.678, and 0.946. In addition, for the confounded conditions the
correlation between $a_1$ and $b_1$ approached -1.00 and that between $a_2$ and $b_1$ approximated 1.00. The
unidimensional item pool used for the CAT simulations had an average $a$ of 1.410 (median
of 1.421) and a mean $c$ of 0.102 (median = 0.101). The Pearson product-moment correlation
between $a$ and $b$ for the unidimensional item pool was 0.077 (Spearman rank-order was
0.076).

Insert Table 1 about here 

Table 2 shows the correlational analyses between $\hat{\theta}$ and $\theta_1$, $\theta_2$, and $\bar{\theta}$ for the
nonconfounded conditions. As can be seen, for each level of the $r_{b_1b_2}$ factor the
association between the CAT $\hat{\theta}$ and $\theta_1$ and with $\theta_2$ became increasingly stronger as $r_{\theta_1\theta_2}$
increased. The intercorrelation between $b$s appeared to have a slight effect on the
correlation between $\hat{\theta}$ and $\theta_1$ and between $\hat{\theta}$ and $\theta_2$. Further, there was a slight decrease in the
average number of items administered with increasing intercorrelation between the $b$s.
Although for the $r_{b_1b_2}$ = 0.90 and $r_{\theta_1\theta_2}$ = 0.90 conditions there were minimal
differences between $r_{\hat{\theta}\bar{\theta}}$, $r_{\hat{\theta}\theta_1}$, and $r_{\hat{\theta}\theta_2}$, for all combinations of the $r_{b_1b_2}$ and $r_{\theta_1\theta_2}$
factors the linear association between $\hat{\theta}$ and $\bar{\theta}$ was greater than either $r_{\hat{\theta}\theta_1}$ or $r_{\hat{\theta}\theta_2}$.

Insert Table 2 about here

Figure 1 shows the RMSE analysis for the three levels of $r_{b_1b_2}$ and the $r_{\theta_1\theta_2}$ =
0.03 and $r_{\theta_1\theta_2}$ = 0.90 conditions; the differences in the plotted $\theta$ values reflect the
differences in the $r_{\theta_1\theta_2}$ conditions. As can be seen, the RMSE with respect to $\bar{\theta}$ was less than
the RMSE of either $\theta_1$ or $\theta_2$ for all nonconfounded conditions. In fact, the RMSE
with respect to $\bar{\theta}$ for the $r_{\theta_1\theta_2}$ = 0.90 condition is comparable to the RMSE when $r_{\theta_1\theta_2}$ =
0.03, $r_{\theta_1\theta_2}$ = 0.30, and $r_{\theta_1\theta_2}$ = 0.60; the RMSE plots for these latter two conditions are the
intermediate steps in the progression from the $r_{\theta_1\theta_2}$ = 0.03 RMSE plots to those of $r_{\theta_1\theta_2}$ =
0.90. The RMSE with respect to $\bar{\theta}$ decreased slightly as $r_{b_1b_2}$ increased. As $r_{\theta_1\theta_2}$
increased, the RMSE of $\theta_1$ or $\theta_2$ approached that of $\bar{\theta}$.

Insert Figure 1 about here 

As would be expected from a Bayesian CAT, the CAT overestimated low ability on
$\theta_1$ and $\theta_2$ (i.e., $\theta_j$ < -2.0) and underestimated high ability on $\theta_1$ and $\theta_2$ (i.e., $\theta_j$ > 2.0);
Figure 2 shows the bias plots for the nonconfounded conditions presented in Figure 1. As
$r_{\theta_1\theta_2}$ increased the bias with respect to $\theta_1$ and $\theta_2$ decreased. For all combinations of the
$r_{\theta_1\theta_2}$ and $r_{b_1b_2}$ factors, minimal bias was obtained when $\hat{\theta}$ was considered an estimate of
$\bar{\theta}$. The $r_{b_1b_2}$ factor does not appear to have a meaningful effect on bias for $\theta_1$, $\theta_2$, and $\bar{\theta}$.

Insert Figure 2 about here 

Table 3 presents the results from the confounded conditions. As was the case with
the nonconfounded condition, $\hat{\theta}$ is more highly related to $\bar{\theta}$ than to either $\theta_1$ or $\theta_2$. In
general, $r_{\hat{\theta}\theta_1}$ tends to be larger than $r_{\hat{\theta}\theta_2}$ for $r_{\theta_1\theta_2}$ values of 0.03 and 0.30, whereas for
the $r_{\theta_1\theta_2}$ = 0.60 and $r_{\theta_1\theta_2}$ = 0.90 conditions the opposite is true. Unlike the
nonconfounded condition, when $r_{\theta_1\theta_2}$ = 0.90 the $r_{\hat{\theta}\theta_2}$ and $r_{\hat{\theta}\bar{\theta}}$ correlations are more
similar to one another and higher in magnitude than $r_{\hat{\theta}\theta_1}$. Further, for all
combinations of the $r_{b_1b_2}$ and $r_{\theta_1\theta_2}$ factors the average test length in the confounded
condition was slightly less than the corresponding nonconfounded condition test length.
The pattern of decreasing test length with increasing $r_{b_1b_2}$ association was not as evident
with the confounded condition as it was under the nonconfounded condition.

Insert Table 3 about here 

Inspection of the confounded conditions' RMSE plots showed the same relationship
between $\hat{\theta}$, $\bar{\theta}$, $\theta_1$, and $\theta_2$; Figure 3 contains the confounded condition sample RMSE plots for
the same conditions presented in Figure 1. For the $r_{b_1b_2}$ = 0.03 and $r_{b_1b_2}$ = 0.60
conditions and for the approximate range -2.0 < $\theta$ < 2.0, the RMSEs for the confounded
conditions are lower than those for the nonconfounded conditions, regardless of the $r_{\theta_1\theta_2}$
condition; as $r_{b_1b_2}$ increases the difference in RMSEs diminishes. As was the case for
the nonconfounded condition, the RMSE of $\theta_2$ was less than that of $\theta_1$ for high ability
examinees for the $r_{b_1b_2}$ = 0.03 condition. In contrast to the nonconfounded condition,
the RMSE with respect to $\theta_1$ was less for lower ability examinees than the RMSE of $\theta_2$.
For all combinations of interdimensional ability and difficulty association the RMSE of $\bar{\theta}$
was less than that of $\theta_1$ and $\theta_2$.

Insert Figure 3 about here 

Figure 4 presents the bias plots corresponding to those in Figure 2, but for the
confounded condition. As can be seen, compared to the nonconfounded condition there was
less bias for $\theta_1$ at low abilities, but no meaningful difference at upper abilities. Although
at the $r_{b_1b_2}$ = 0.03 and $r_{b_1b_2}$ = 0.60 conditions there is no difference between the
confounded and nonconfounded conditions in bias with respect to $\theta_2$, for $r_{b_1b_2}$ = 0.90
there was an increase in bias for $\theta$ < -1.0. For all interdimensional difficulty levels there
was an increase in bias for $\theta_2$ in the $\theta$ range 1.0 to 3.0. In general, as $r_{\theta_1\theta_2}$ increased this
pattern was evident, although with decreasing levels of bias in the estimation of $\theta_1$ and $\theta_2$.
As was the case with the nonconfounded condition, the bias in $\hat{\theta}$ with respect to $\bar{\theta}$ was less
than that of estimating either $\theta_1$ or $\theta_2$, except when $r_{\theta_1\theta_2}$ = 0.90. In this latter condition,
the differences in bias with respect to $\theta_1$, $\theta_2$, and $\bar{\theta}$ may not be considered meaningful by
some; for this $r_{\theta_1\theta_2}$ condition there does not appear to be any difference in bias between
the confounded and nonconfounded conditions. For $r_{\theta_1\theta_2}$ = 0.90 and regardless of $r_{b_1b_2}$
level, the CAT overestimated low ability more than it underestimated high ability.

Insert Figure 4 about here 

As stated above, for the nonconfounded condition there was a slight decrease in the
average number of items administered with increasing $r_{b_1b_2}$, although this pattern was
not as evident with the confounded condition. Calculation of the average number of items
administered at each of the 16 levels of $\theta$ showed that, in general, test length ranged from
shorter tests for $\theta$ < 0.0 (e.g., average test lengths of 15-16 items depending on the
condition) to longer tests for $\theta$ > 2.0 (e.g., mean test lengths of 17-20 items depending on the
particular data set; the $r_{b_1b_2}$ = 0.03, $r_{\theta_1\theta_2}$ = 0.90 condition had an atypical mean test
length of 22 items for $\theta$ = 3.0). With increasing $r_{b_1b_2}$ and $r_{\theta_1\theta_2}$ the mean test lengths
became less variable across $\theta$. Of the 38,400 adaptive tests simulated, the absolute
maximum and minimum test lengths were 28 and 11 items, respectively.

Conclusion and Discussion 
In general, increasing interdimensional difficulty association produced a slight
decrease in test length and an increase in the accuracy of ability estimation as assessed
by RMSE. The associations between $\hat{\theta}$ and $\bar{\theta}$, $\theta_1$, and $\theta_2$ increased as the correlation
between interdimensional difficulties and interdimensional ability increased. The
largest associations were between $\hat{\theta}$ and $\bar{\theta}$: 0.957 and 0.961 for the nonconfounded and
confounded conditions, respectively. For comparative purposes, a Bayesian CAT
(maximum test length of 20 items and termination SEE of 0.05) using a unidimensional
data set (generated according to the 3PL model and using this item pool) had an $r_{\hat{\theta}\theta}$ of 0.988
(for both Pearson and Spearman coefficients) and an average test length of 15.613.

When discrimination was confounded with difficulty, the ability estimates showed
a differential association with one of the two latent traits; however, the correlation
between $\hat{\theta}$ and $\bar{\theta}$ was always greater than $r_{\hat{\theta}\theta_1}$ and $r_{\hat{\theta}\theta_2}$. For all combinations of the
$r_{b_1b_2}$ and $r_{\theta_1\theta_2}$ factors the correlation between $\hat{\theta}$ and $\bar{\theta}$ for the confounded condition was
always greater than the correlation for the corresponding nonconfounded condition.

From the results of the studies on the effects of dimensionality on the calibration
of compensatory multidimensional data it may be hypothesized that the finding that $\hat{\theta}$ was
an estimate of the average of the true $\theta$s was, in part, a result of the fact that $\hat{b}$ was an estimate of
the average of the true $b$s. That is, because $b$ and $\theta$ are on the same scale, when the
separate dimensions are collapsed in the estimation of $b$, the subsequent stage of
estimating $\theta$ will also reflect the collapsed difficulty scale; both BILOG and LOGIST obtain
$\hat{b}$s prior to estimating $\theta$. However, given that in CAT the item parameters are assumed true,
the collapsing of the two difficulty scales does not account for $\hat{\theta}$ being an estimate of
the average of the $\theta_T$s.

Conceptually, the item pool may be considered to have come from the calibration of
a unidimensional data set. However, the results should be generalizable to those
situations where item parameters are obtained from data which are not truly
unidimensional (i.e., the situations investigated by Ackerman, 1989; Way et al., 1988; Ansley &
Forsyth, 1985). For item selection it is the distribution of $b$ and the magnitude of $a$ and $c$
which are important; CAT makes no distinction with respect to whether $\hat{b} = b$ or
$\hat{b} = (b_1 + b_2)/2$ and $\hat{a} = a$ or $\hat{a} = a_1 + a_2$.

As stated above, CAT is concerned with minimizing the measurement errors
associated with the estimation of an examinee's ability. It was shown that the
dimensionality of the response data shifts the focus for the minimization of measurement errors
from $\theta_T$ (with unidimensional data) to the average of the latent abilities (with
bidimensional data). Although the results may be considered problematic by some, there may be
situations where one is only interested in ordering examinees on their ability to perform
or solve certain types of problems and not in ordering them on the separate latent
abilities which may be required to solve the problems. For example, on a statistics exam
the instructor may only be interested in a student's understanding of the appropriateness
and use of t-tests. The problems may be stated as word problems and require stating the
appropriate statistical hypotheses, identifying and calculating the relevant
t-statistic, arriving at conclusions concerning the truth or falsity of hypotheses, etc. Most
likely the instructor is not interested in the student's standing on the separate abilities
required to answer the problem (e.g., his or her reading ability, math ability, etc.), but in
the student's understanding of t-tests. Reckase, Ackerman, and Carlson (1988) have
concluded that IRT's unidimensionality assumption does not necessarily require test
items to measure a single ability, but rather the unidimensionality assumption requires
the test items to measure the same composite of abilities. For this study, this composite
of abilities was the average of $\theta_1$ and $\theta_2$.

Table 1. Item parameters' interdimensional correlations^a.

Condition              Item parameter   $a_2$              $b_1$              $b_2$
$r_{b_1b_2}$ = 0.03    $a_1$            -0.990 (-0.032)    -0.995 (-0.022)    -0.088 (0.055)
                       $a_2$                               0.997 (0.014)      0.112 (-0.034)
                       $b_1$                                                  0.095 (0.095)

$r_{b_1b_2}$ = 0.60    $a_1$            -0.990 (-0.032)    -0.997 (-0.022)    -0.677 (0.027)
                       $a_2$                               0.995 (0.014)      0.671 (-0.016)
                       $b_1$                                                  0.678 (0.678)

$r_{b_1b_2}$ = 0.90    $a_1$            [illegible] (-0.032)   [illegible] (-0.022)   [illegible] (-0.002)
                       $a_2$                               0.995 (0.014)      0.940 (0.002)
                       $b_1$                                                  0.946 (0.946)

^a Pearson product-moment correlations for confounded and (nonconfounded) conditions.

Table 2. Intercorrelations^a between $\hat{\theta}$ and $\theta_1$, $\theta_2$, $\bar{\theta}$, and the average number of items
administered (Mean NIA) for the nonconfounded conditions.

Item Pool Characteristics
$r_{b_1b_2}$   $r_{\theta_1\theta_2}$   $r_{\hat{\theta}\theta_1}$   $r_{\hat{\theta}\theta_2}$   $r_{\hat{\theta}\bar{\theta}}$   Mean NIA (SD NIA)
0.03    0.03    0.500 (0.518)    0.645 (0.520)    0.821 (0.785)    17.206 (2.577)
        0.30    0.611 (0.630)    0.752 (0.660)    0.844 (0.825)    17.187 (2.649)
        0.60    0.741 (0.751)    0.816 (0.825)    0.873 (0.854)    17.146 (2.587)
        0.90    0.890 (0.890)    0.893 (0.880)    0.900 (0.891)    17.408 (2.773)

0.60    0.03    0.513 (0.539)    0.694 (0.564)    0.866 (0.840)    16.896 (2.390)
        0.30    0.678 (0.697)    0.779 (0.686)    0.903 (0.888)    16.878 (2.440)
        0.60    0.799 (0.801)    0.849 (0.863)    0.924 (0.902)    16.703 (2.442)
        0.90    0.922 (0.921)    0.929 (0.915)    0.934 (0.924)    16.926 (2.543)

0.90    0.03    0.552 (0.562)    0.723 (0.601)    0.914 (0.889)    16.407 (2.249)
        0.30    0.727 (0.734)    0.794 (0.707)    0.942 (0.925)    16.323 (2.243)
        0.60    0.829 (0.823)    0.874 (0.886)    0.955 (0.928)    16.265 (2.257)
        0.90    0.945 (0.940)    0.251 (0.935)    0.957 (0.942)    16.428 (2.408)

^a Pearson product-moment correlation coefficient (Spearman rank-order correlation
coefficient)

Table 3. Intercorrelations^a between $\hat{\theta}$ and $\theta_1$, $\theta_2$, $\bar{\theta}$, and the average number of items
administered (Mean NIA) for the confounded conditions.

Item Pool Characteristics
$r_{b_1b_2}$   $r_{\theta_1\theta_2}$   $r_{\hat{\theta}\theta_1}$   $r_{\hat{\theta}\theta_2}$   $r_{\hat{\theta}\bar{\theta}}$   Mean NIA (SD NIA)
0.03    0.03    0.628 (0.674)    0.623 (0.437)          0.897 (0.851)          16.433 (2.213)
        0.30    0.692 (0.747)    [illegible] (0.617)    [illegible] (0.886)    [illegible] (2.379)
        0.60    0.763 (0.801)    0.850 (0.836)          0.905 (0.883)          [illegible] (2.495)
        0.90    0.907 (0.907)    0.921 (0.910)          0.922 (0.909)          17.194 (3.178)

0.60    0.03    0.663 (0.720)    [illegible] (0.422)    0.920 (0.877)          [illegible] (2.300)
        0.30    0.760 (0.797)    0.750 (0.614)          [illegible] (0.923)    16.291 (2.209)
        0.60    0.834 (0.850)    0.856 (0.851)          0.948 (0.916)          16.276 (2.214)
        0.90    0.928 (0.927)    0.942 (0.933)          0.943 (0.930)          16.730 (2.582)

0.90    0.03    0.704 (0.749)    0.608 (0.418)          0.941 (0.907)          15.917 (2.110)
        0.30    0.787 (0.802)    0.756 (0.635)          0.956 (0.942)          16.066 (2.085)
        0.60    0.840 (0.845)    0.874 (0.868)          0.961 (0.923)          16.031 (2.056)
        0.90    0.944 (0.939)    0.955 (0.945)          0.958 (0.941)          16.230 (2.235)

^a Pearson product-moment correlation coefficient (Spearman rank-order correlation
coefficient)

Figure Captions

Figure 1. RMSE analysis for the nonconfounded conditions for $r_{b_1b_2}$ = 0.03, $r_{b_1b_2}$ = 0.60,
$r_{b_1b_2}$ = 0.90 and $r_{\theta_1\theta_2}$ = 0.03, $r_{\theta_1\theta_2}$ = 0.90.

[Figure 1 plots: RMSE panels titled "Ability r = 0.03; Difficulty r = 0.03; Nonconfounded",
"Ability r = 0.90; Difficulty r = 0.03; Nonconfounded", "Ability r = 0.03; Difficulty r = 0.90;
Nonconfounded", and "Ability r = 0.90; Difficulty r = 0.90; Nonconfounded". Horizontal axis:
Theta; vertical axis: RMSE. Plotted series: RMSE Theta 1, RMSE Theta 2, RMSE Avg Theta.]



Dimensionality and CAT estimation 



Figure 2. Bias analysis for the nonconfounded conditions for $r_{b_1b_2}$ = 0.03, $r_{b_1b_2}$ = 0.60,
$r_{b_1b_2}$ = 0.90 and $r_{\theta_1\theta_2}$ = 0.03, $r_{\theta_1\theta_2}$ = 0.90.

Figure 3. RMSE analysis for the confounded conditions for $r_{b_1b_2}$ = 0.03, $r_{b_1b_2}$ = 0.60,
$r_{b_1b_2}$ = 0.90 and $r_{\theta_1\theta_2}$ = 0.03, $r_{\theta_1\theta_2}$ = 0.90.

Figure 4. Bias analysis for the confounded conditions for $r_{b_1b_2}$ = 0.03, $r_{b_1b_2}$ = 0.60,
$r_{b_1b_2}$ = 0.90 and $r_{\theta_1\theta_2}$ = 0.03, $r_{\theta_1\theta_2}$ = 0.90.